\section{Our technique}
\label{sec-our-technique}

\subsection{Two versions of every function body}
\label{sec-two-body-versions}

We provide two different versions of every function body%
\footnote{This idea was suggested by Michael Raskin.}.
One version, called the \emph{ordinary} version, and the other one is
called the \emph{debugging} version.  Each version is provided as a
separate \emph{entry point} for the function%
\footnote{This idea was suggested by Frode Fjeld.}.
The two versions are \emph{similar} (but not
identical) copies of the entire function body.

By including both versions in the same function, we make it
unnecessary for the application programmer to recompile the code with
higher debug settings when it is desirable to have more debugging
information than what the compiler would generate by default.

The ordinary function body is compiled using every typical optimization
technique used by a good compiler, including:

\begin{itemize}
\item constant folding,
\item dead code elimination,
\item common sub-expression elimination,
\item loop-invariant code motion,
\item induction-variable optimization, 
\item elimination of in-scope dead variables, and
\item tail-call optimization.
\end{itemize}

\noindent
Some of these optimization techniques are essential for
high-performance code, but many of them can make it significantly
harder for the user to understand what the program is doing:

\begin{itemize}
\item Common sub-expression elimination and similar techniques for
  redundancy elimination may make it impossible to set
  a breakpoint in some part of the code, simply because that code has
  been eliminated by the compiler.
\item When a variable is used for the last time, the compiler
  typically reuses the place that it occupies for other purposes, even
  though the variable may still be in scope.  This optimization makes
  it impossible for the user to examine the value of
  a variable that has been eliminated.  A user with
  a poor understanding of compiler-optimization techniques will find
  the result surprising.
\item Loop-invariant code motion results in code being moved from
  inside a loop to outside it.  Any attempt by the application
  programmer to set a breakpoint in such code will fail.
\item Induction-variable optimization will eliminate or replace
  variables in source code by others that are more beneficial for
  the performance of the computation, again making it harder for the
  user to debug the code.
\item Tail-call optimization exploits the fact that the stack needs to
  reflect only the \emph{future} of the computation, but the
  \emph{past} can be omitted.  For the purpose of debugging, the past
  of the computation provides essential information to the developer.
\end{itemize}

\noindent
To avoid many of these inconveniences to the user,
the debugging version of the function body is compiled in a way that
makes the code somewhat slower, but much more friendly for the purpose
of debugging.  Some of the optimization techniques cited above will
not be performed at all, or only in a less ``aggressive'' form.
Messages from the compiler (such as when dead code is eliminated) are
emitted based on the compilation of the normal version of the function
body so that the maximum amount feedback is obtained.

Notice that the existence of the two versions of the function body
makes it usually unnecessary for the programmer to explicitly indicate
values of the \texttt{debug} optimize quality.  In fact, forcing the
programmer to set this value is often a major inconvenience, since it
typically requires the programmer to choose between debugging
convenience and execution speed.  The \texttt{speed} and
\texttt{compilation-speed} optimize qualities may influence
compilation of the the normal version of the body, but not the
debugging version.  The debugging version will in general be much
faster to generate because of the absence of most optimization passes.

Furthermore, the debugging version of the code is compiled so that a
small routine is called immediately before and immediately after the
execution corresponding to the evaluation of a form in the source
code.  In order to determine whether a breakpoint is present at that
particular location in the source code, this routine performs a query
of a table managed by the debugger.  While the details of how this
table is implemented might evolve, here we give one example of such an
implementation, thereby arguing that performing a query is not overly
expensive in terms of performance.

As an example of implementation of this table, it might be split into
two sub-tables called the \emph{summary table} and the
\emph{breakpoint table}.  Both these tables are managed by the
debugger, in that actions on the part of the user may alter their
contents.  The application consults these tables, directly or
indirectly, to determine whether a breakpoint is present.

The purpose of the summary table is to provide a quick test that
almost always indicates that no breakpoint is present.  Thus, the
summary table is a fixed-size bit vector.  The size will typically be
a small power of $2$, for instance $1024$ which represents a modest 16
64-bit words on a modern processor.  The application routine computes
the value of the program counter modulo the size of the table in order
to determine an index.  If the entry in the table contains $0$, then
there is definitely not a breakpoint present at the source location in
question.  Since there are typically only a modest number of
breakpoints in a program, most of the time, the entry will contain a
$0$, making the routine return immediately, and normal form evaluation
to continue.  The debugging version of the function body accesses this
table early on in order to create a reference to it in a lexical
variable.  This lexical variable is subject to register allocation as
usual.

If the entry in the summary table contains $1$, then there is a
breakpoint at \emph{some} value of the program counter that, when
taken modulo the size of the table, has a breakpoint present.  If this
is the case, then the routine consults the breakpoint table.  In other
words, the summary table acts as a Bloom filter
\cite{10.1145:362686.362692}, in that false positives are possible,
but false negatives are not.  The size of the table determines the
probability of a false positive.

The breakpoint table is a hash table in which the keys are values of
the program counter%
\footnote{In implementations where code can be moved by the garbage
  collector, this table must be re-hashed after a collection.  The
  tentative decision for \sicl{} is to have all code at fixed locations.}
and the values are objects that the debugger uses in order to
determine information about the breakpoint in question.  When the
routine finds that a breakpoint is present at the current source
location, it informs the debugger.  Details of the communication
between the application thread and the debugger are discussed in
\refSec{sec-debugger-application-communication}.

In the ordinary version of the function body, when a function call is
made, the caller uses the entry point of the callee corresponding to
the ordinary version of the body of the callee.  In the debugging
version of the function body, on the other hand, when a function call
is made, the caller uses the entry point of the callee corresponding
to the debugging version of the body of the callee.  This mechanism
thus automatically propagates the information about debugging
throughout the call chain.

\subsection{Communication between the debugger and the application}
\label{sec-debugger-application-communication}

Debuggers in \unix{} systems have full access to the address space of
the application, including the stack and the lexical variables.  A
\unix{} debugger can therefore modify any lexical variable and then
continue the execution of the application.  Such manipulations may
very well violate some of the assumptions made by the compiler for a
particular code fragment.  For example, if the code contains a test
for the value of a numeric variable, the compiler may make different
assumptions about this value in the two different branches executed as
a result of the test.

Allowing a debugger to make arbitrary modifications to lexical
variables, let alone to any memory location, in a \commonlisp{}
application program will defeat any attempts at making the system
safe, and safety is one of the objectives of the \sicl{} system as
expressed in \refSec{sec-sicl-features}.  We must therefore come up
with a different communication protocol that keeps the system safe.

Our design contains two essential elements for this purpose:

\begin{enumerate}
\item The debugger consists of an interactive application with a
  \emph{command loop}.  An iteration of this command loop can of
  course be prompted by a user interaction.  However, when the
  application detects a breakpoint by querying the tables described in
  \refSec{sec-two-body-versions}, it injects a command into the
  command loop of the debugger, triggering the execution of code in
  the debugger to handle the breakpoint.
\item A shared \emph{queue} is used to send messages from the debugger
  to the application.  This queue has a \emph{semaphore} associated
  with it.
\item Once the application has informed the debugger about a
  breakpoint, it attempts to \emph{dequeue} the next message on the
  queue.  If the queue is empty, the application automatically
  \emph{waits} on the associated semaphore, until the debugger issues
  an \emph{enqueue} operation with instructions for the application.
\end{enumerate}

The debugger is in charge of taking into account the commands issued
by the user.  When the user indicates that a certain action should be
performed at a particular place in the source code, the debugger
populates the two tables mentioned in \refSec{sec-two-body-versions},
and records the particular action the user desires, for example:

\begin{figure}
\begin{center}
\inputfig{fig-communication.pdf_t}
\end{center}
\caption{\label{fig-communication}
Communication between user, application, and debugger.}
\end{figure}

\begin{itemize}
\item The user might indicate that the application should stop and
  wait for further actions by the user, after the user has examined
  the state of application data.  In this case, the debugger records
  this desire in the breakpoint table.  When the application then
  reaches the breakpoint in question, the debugger waits for further
  user action in its command loop.
\item After the user has examined the state of application data as a
  result of the application having stopped, the user can issue a
  command that makes the application continue normal execution.  The
  debugger then immediately sends a message to this effect to the
  application.
\item The user might indicate that a \emph{trace message} should be
  printed without stopping the application.  Then, when the breakpoint
  is reached, the debugger displays the message to the user and sends
  a message to the application to continue execution.
\item The user can also indicate that the execution of the application
  should be \emph{stepped} in one of several different ways:
  \begin{itemize}
  \item \emph{next}: Execution stops at the next possible
    program point.
  \item \emph{in}: Execution stops at the beginning of a
    function being called.
  \item \emph{out}: The remaining sub-forms of the form containing the
    current breakpoint are evaluated, and execution stops immediately
    after the evaluation of that form.
  \item \emph{over}: When a breakpoint is reached that is located
    immediately before a form is evaluated, the form is evaluated and
    execution stops immediately after this evaluation.
  \item \emph{finish}:  Execution of the currently executing function
    terminates, and stops in the calling function
    immediately after the call.
  \end{itemize}
  The debugger then sets one or more \emph{volatile
    breakpoints} (i.e., breakpoints that will be removed once reached)
  at source locations corresponding to the type of stepping required.
  It then instructs the application to continue execution as usual.
\end{itemize}

To allow for the user to examine the state of the
application, when the application thread detects a breakpoint, the
command it injects into the debugger command loop contains a complete
list of live local variables and their values, as well as of special
variables bound in the application thread.

Since we intend to provide debugger commands for examining and
modifying application data, we must make sure that any such
manipulation on the part of the user preserves the integrity of the
application.

In particular, any assumptions made by the compiler about the
structure or type of some lexical variable must be impossible to
violate through the modification of the value of a lexical variable.
We obtain this property by making sure that the compiler does not
propagate any information about the structure or type of lexical
variables between program points that admit breakpoints.  Thus, any
run-time manipulation that requires this structure or type to be known
must be preceded by an explicit test, and the compiler does not
generate code that admits a breakpoint between the test and the
manipulation.

\subsection{Debugger commands available to the user}

We have an embryonic implementation of an interactive debugger, called
\clordane{}.%
\footnote{The name Clordane is a deliberate misspelling of
  ``Chlordane'' which is a pesticide that was banned in most countries
  in the 1980s.  The misspelling was designed to suggest the
  \commonlisp{} language and to make answers by search engines less
  cluttered.}
We use the \mcclim{} library for writing graphic user
interfaces as a basis for this tool.  Currently, \clordane{} can show
the source code of an application (one source file at a time) and an
indication of the place of a breakpoint.  The application being
debugged is then compiled with the \sicl{} compiler, generating
high-level intermediate representation (HIR).  The HIR code is then
interpreted by a small program running in a host \commonlisp{}
implementation.

The communication protocol described in
\refSec{sec-debugger-application-communication} has been implemented
and works to our satisfaction, but only a small subset of interactions
have been implemented so far.

We think that the following commands must be implemented in a fully
featured debugger:

\begin{itemize}
\item The user should be able to point to a location
  in the source code to indicate a particular action to be taken at
  that point:
  \begin{itemize}
  \item Stop the execution of the application and wait for further
    actions.
  \item Print a trace message, possibly containing the values of live
    variables, and then continue the execution. 
  \end{itemize}
  It should be possible to make the action conditional, based on some
  arbitrary expression to be evaluated in the debugger thread.  This
  expression can contain references to live variables in the
  application.
\item When the application is stopped, the user
  should be able to examine live variables, and (in some cases, with
  restrictions) modify their values.
\item Also, when the application is stopped, the application
  programmer should be able to issue one of several types of
  \emph{stepping} commands, implicitly indicating the next location
  for the application to stop.
\end{itemize}
